Sublinear Algorithms for Earth Mover's Distance

Author

  • Terry P. Orlando
Abstract

We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in [0, ∆], with sample complexities independent of domain size, permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other parameters, such as the diameter of the domain space, which may be significantly smaller. We also prove lower bounds showing our testers to be optimal in their dependence on these parameters. Additionally, we consider whether natural classes of distributions exist for which there are algorithms with better dependence on the dimension, and show that for highly clusterable data, this is indeed the case. Lastly, we consider a variant of the EMD, defined over tree metrics instead of the usual L1 metric, and give optimal algorithms.
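
To make the sample-access setting concrete, here is a minimal Python sketch of a plug-in estimate on the line: it computes the EMD between the empirical distributions of two equal-size samples drawn from [0, ∆]. This is not the tester or estimator from the paper, only an illustration of the quantity being estimated. The function name `empirical_emd_1d`, the equal-size restriction, and the choice of one dimension are assumptions made for the example.

```python
import random


def empirical_emd_1d(xs, ys):
    """EMD between the empirical distributions of two equal-size samples on a line.

    Hypothetical helper for illustration; assumes len(xs) == len(ys) > 0.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    # In one dimension an optimal matching pairs the i-th smallest point of
    # one sample with the i-th smallest point of the other.
    total = sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys)))
    # Each sample point carries probability mass 1/n.
    return total / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    delta = 1.0  # assumed domain bound, i.e. samples lie in [0, delta]
    p_samples = [rng.uniform(0, delta) for _ in range(2000)]
    q_samples = [rng.uniform(0, delta / 2) for _ in range(2000)]
    print(empirical_emd_1d(p_samples, q_samples))
```

For the two uniform distributions in the usage example, the true EMD is ∆/4, so the printed estimate should land close to that value, up to sampling error.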

Similar resources

arXiv:0904.0292v1 [cs.DS] 2 Apr 2009 Sublinear Time Algorithms for Earth Mover’s Distance

We study the problem of estimating the Earth Mover’s Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in [0,∆], with sample complexities independent of domain size – permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other para...

Improved Approximation Algorithms for Earth-Mover Distance in Data Streams

For two multisets S and T of points in [∆], such that |S| = |T| = n, the earth-mover distance (EMD) between S and T is the minimum cost of a perfect bipartite matching with edges between points in S and T, i.e., $\mathrm{EMD}(S, T) = \min_{\pi : S \to T} \sum_{a \in S} \|a - \pi(a)\|_1$, where π ranges over all one-to-one mappings. The sketching complexity of approximating earth-mover distance in the two-dimensional grid is men...

Sketching Earth-Mover Distance on Graph Metrics

We develop linear sketches for estimating the Earth-Mover distance between two point sets, i.e., the cost of the minimum weight matching between the points according to some metric. While Euclidean distance and Edit distance are natural measures for vectors and strings respectively, Earth-Mover distance is a well-studied measure that is natural in the context of visual or metric data. Our work ...

Research in Algorithms for Geometric Pattern Matching, MIT 2001-06 Progress Report: January 1, 2002 – June 30, 2002

During the period of January-June 2002, the main focus of this project was implementing and evaluating algorithms for embedding Earth-Mover Distance into the Euclidean space. Earth-mover distance (EMD) is a recently proposed metric for computing distance between features of images (see [EMD] and references therein). It was experimentally verified to capture well the perceptual notion of a diffe...

Publication date: 2009